 massively multilingual nlu 2022


Amazon Releases 51-Language Dataset for Language Understanding

#artificialintelligence

Imagine that all people around the world could use voice AI systems such as Alexa in their native tongues. One promising approach to realizing this vision is massively multilingual natural-language understanding (MMNLU), a paradigm in which a single machine learning model can parse and understand inputs from many typologically diverse languages. By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is scarce. Today we are pleased to make three announcements related to MMNLU. First, we are releasing a new dataset called MASSIVE, which is composed of one million labeled utterances spanning 51 languages, along with open-source code that provides examples of how to perform massively multilingual NLU modeling and allows practitioners to re-create the baseline results presented in our paper. Second, we are launching a new competition using the MASSIVE dataset called Massively Multilingual NLU 2022 (MMNLU-22).


Amazon Kickstarts Natural Language Understanding By Open-Sourcing 'MASSIVE' Speech Dataset

#artificialintelligence

To scale natural language understanding to every spoken language on Earth, Amazon has announced the release of its open-source 'MASSIVE' speech dataset. The main goal of curating such a dataset was to help researchers develop virtual assistants that generalize easily to some of the world's lesser-known languages. In addition to the dataset, Amazon has also published open-source modeling code to help developers create more capable virtual assistants. Several recent breakthroughs in speech recognition and natural language understanding (NLU) have paved the way for voice-activated digital assistants such as Siri, Bixby, and Google Assistant. The primary shortcoming of these voice-controlled personal assistants is that they are available in only a few widely spoken languages.